Calibrating Deep Neural Networks using Focal Loss
Miscalibration -- a mismatch between a model's confidence and its correctness -- of Deep Neural Networks (DNNs) makes their predictions hard to rely on. Ideally, we want networks to be accurate, calibrated and confident. We show that, as opposed to the standard cross-entropy loss, focal loss (Lin et al., 2017) allows us to learn models that are already very well calibrated. When combined with temperature scaling, whilst preserving accuracy, it yields state-of-the-art calibrated models. We provide a thorough analysis of the factors causing miscalibration, and use the insights we glean from this to justify the empirically excellent performance of focal loss. To facilitate the use of focal loss in practice, we also provide a principled approach to automatically select the hyperparameter involved in the loss function. We perform extensive experiments on a variety of computer vision and NLP datasets, and with a wide variety of network architectures, and show that our approach achieves state-of-the-art calibration without compromising on accuracy in almost all cases.
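As context for the comparison above, here is a minimal sketch of focal loss next to cross-entropy in plain Python (illustrative only; real training code would use a tensor library, and the choice of gamma = 2 here is just a common default):

```python
import math

def cross_entropy(p_true: float) -> float:
    # Standard cross-entropy on the probability assigned to the true class.
    return -math.log(p_true)

def focal_loss(p_true: float, gamma: float = 2.0) -> float:
    # Focal loss (Lin et al., 2017): down-weights well-classified samples
    # via the modulating factor (1 - p_true)^gamma; gamma = 0 recovers CE.
    return -((1.0 - p_true) ** gamma) * math.log(p_true)

# A confident, correct prediction contributes far less under focal loss,
# which weakens the incentive to push scores toward overconfident extremes.
for p in (0.6, 0.9, 0.99):
    print(f"p={p}: CE={cross_entropy(p):.4f}  FL(gamma=2)={focal_loss(p):.4f}")
```

Note how the gap between the two losses widens as `p_true` approaches 1: it is exactly the already-correct, high-confidence samples whose gradient contribution focal loss suppresses.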
AdaFocal: Calibration-aware Adaptive Focal Loss
Much recent work has been devoted to the problem of ensuring that a neural network's confidence scores match the true probability of being correct, i.e. the calibration problem. Of note, it was found that training with focal loss leads to better calibration than cross-entropy while achieving a similar level of accuracy \cite{mukhoti2020}. This success stems from focal loss regularizing the entropy of the model's predictions (controlled by the parameter $\gamma$), thereby reining in the model's overconfidence. Further improvement is expected if $\gamma$ is selected independently for each training sample (Sample-Dependent Focal Loss (FLSD-53) \cite{mukhoti2020}). However, FLSD-53 is based on heuristics and does not generalize well. In this paper, we propose a calibration-aware adaptive focal loss called AdaFocal that exploits the calibration properties of focal (and inverse-focal) loss and adaptively modifies $\gamma_t$ for different groups of samples based on $\gamma_{t-1}$ from the previous step and knowledge of the model's under-/over-confidence on the validation set. We evaluate AdaFocal on various image recognition tasks and one NLP task, covering a wide variety of network architectures, and confirm improved calibration at similar levels of accuracy. Additionally, we show that models trained with AdaFocal achieve a significant boost in out-of-distribution detection.
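The abstract describes the adaptive rule only at a high level. The sketch below is a hypothetical rendering of such a calibration-driven update, not the paper's exact algorithm: the function name, the step size `lam`, and the clamping range are all our assumptions.

```python
import math

def update_gamma(gamma_prev: float, calib_gap: float,
                 lam: float = 1.0, gamma_max: float = 20.0) -> float:
    """Hypothetical per-group gamma update in the spirit of AdaFocal.

    calib_gap = (average confidence - accuracy) for a group of samples,
    measured on a validation set: positive means the model is overconfident
    on that group, negative means it is underconfident.
    """
    # Multiplicative update: raise gamma when overconfident (stronger
    # entropy regularization), lower it when underconfident.
    gamma = gamma_prev * math.exp(lam * calib_gap)
    # Clamp to keep the loss numerically well-behaved across training.
    return min(max(gamma, -gamma_max), gamma_max)
```

A negative effective gamma would correspond to the inverse-focal regime the abstract alludes to, where the loss instead encourages higher confidence on underconfident groups.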
Hybrid Convolution Neural Network Integrated with Pseudo-Newton Boosting for Lumbar Spine Degeneration Detection
Pandiyaraju V, Abishek Karthik, Jaspin K, Kannan A, Jaime Lloret
This paper proposes an enhanced model architecture for classifying lumbar spine degeneration from DICOM images using a hybrid approach that integrates EfficientNet and VGG19 with custom-designed components. The proposed model differs from traditional transfer learning methods in that it incorporates a Pseudo-Newton Boosting layer along with a Sparsity-Induced Feature Reduction Layer, forming a multi-tiered framework that further improves feature selection and representation. The Pseudo-Newton Boosting layer adaptively re-weights features, emphasizing fine-grained anatomical detail that is mostly lost in a standard transfer learning setup. In addition, the Sparsity-Induced Layer removes redundancy from the learned features, producing lean yet robust representations of lumbar spine pathology. The architecture is novel in that it overcomes the constraints of traditional transfer learning, especially in the high-dimensional context of medical images, and achieves a significant performance boost over the EfficientNet baseline, reaching a precision of 0.90, recall of 0.861, F1 score of 0.88, loss of 0.18, and accuracy of 88.1%. This work presents the architecture, preprocessing pipeline, and experimental results, which contribute to the development of automated diagnostic tools for medical images.
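The abstract does not specify how the Sparsity-Induced Feature Reduction Layer is implemented. One standard way to induce sparsity in a feature vector is soft-thresholding, the proximal operator of an L1 penalty; the sketch below illustrates only that general idea and is not the paper's layer.

```python
def soft_threshold(features, lam):
    # Sparsity-inducing shrinkage (soft-thresholding): entries with
    # magnitude below lam are zeroed out, and the rest are shrunk
    # toward zero, leaving a lean feature representation.
    return [max(abs(x) - lam, 0.0) * (1.0 if x >= 0 else -1.0)
            for x in features]

feats = [0.8, -0.05, 0.02, -1.2, 0.3]
print(soft_threshold(feats, 0.1))  # small entries collapse to 0.0
```

In a network, such an operation (or an L1 penalty on activations) trims low-magnitude, redundant responses before the classification head.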
Appendix for Learning to Predict Trustworthiness with Steep Slope Loss
Yan Luo
The proof follows from Hoeffding's bound. The ViT (i.e., ViT Base/16) used in this work is implemented in the ASYML project. The code is implemented in Python 3.8.5 with PyTorch 1.7.1. For the other experiments and analyses, we run one time. The implementation provides the pre-trained models on MNIST and CIFAR-10, and is released under its own license, while the implementation of ViT is licensed under the Apache-2.0 License. Ideally, we hope that all the confidences w.r.t. the positive class lie on the right-hand side of the positive threshold, while the ones w.r.t. the negative class lie on the left-hand side of the negative threshold. The oracles used to generate the confidences are the ones used in Table 1. Evaluation uses the stylized ImageNet validation set (stylized val) and the adversarial ImageNet validation set (adversarial val).